Predictive analysis uses historical data to predict future events. Typically, historical data are used to build a model that captures temporal patterns; this model is then applied to current data to predict what will happen. There are various methodologies for carrying out predictions; in this workshop we opted for the Bayesian approach.
The Bayesian approach, in the context of predictive analysis, centers on the calculation of the posterior predictive distribution. This distribution uses the information in the data as well as the posterior distribution of the parameters of interest. From these inputs, the predictive distribution can generate future data; that is, missing data over a given horizon.
\[\begin{align*} f(y_{pred}|y_{obs}) = \int_{\theta} f(y_{pred},\theta|y_{obs})\,d\theta = \int_{\theta} f(y_{pred}|\theta, y_{obs})\,f(\theta|y_{obs})\,d\theta \end{align*}\]
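In simple conjugate settings this integral can be approximated by Monte Carlo: draw parameter values from the posterior, then draw future observations from the likelihood given each draw. A minimal sketch on simulated counts with a hypothetical Gamma-Poisson model (illustrative values, not the workshop data):

```r
# Sketch: Monte Carlo approximation of the posterior predictive distribution
# for a Poisson likelihood with a Gamma(a, b) prior (conjugate case, so the
# posterior of theta is Gamma(a + sum(y), b + n)). All values are illustrative.
set.seed(123)
y_obs <- c(0, 1, 4, 0, 1, 4)                 # hypothetical observed counts
a <- 1; b <- 1                               # hypothetical prior parameters
# Draws from f(theta | y_obs)
theta <- rgamma(10000, shape = a + sum(y_obs), rate = b + length(y_obs))
# Draws from f(y_pred | y_obs), composing the two steps of the integral
y_pred <- rpois(10000, lambda = theta)
quantile(y_pred, c(0.025, 0.5, 0.975))       # predictive summary
```

INLA replaces this sampling with deterministic approximations, but the object being computed is the same.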
There are currently several methodologies for computing this distribution numerically, each of which corresponds in R to a particular package. Some widely known packages are as follows:
Relatively recently, the Integrated Nested Laplace Approximation (INLA) methodology has gained importance for its relative efficiency in estimating complex models.
In practice, prediction analysis can be understood as predicting unknown data over a certain horizon. In this sense, these future data can be interpreted as missing values, which are the object of the prediction.
| Year | Week | Number of deaths |
|---|---|---|
| 2019 | 1 | 0 |
| 2019 | 2 | 1 |
| 2019 | 3 | 4 |
| 2020 | 1 | 0 |
| 2020 | 2 | 1 |
| 2020 | 3 | 4 |
| 2021 | 1 | NA |
| 2021 | 2 | NA |
| 2021 | 3 | NA |
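The arrangement above can be reproduced in a few lines of R: the horizon to predict (here 2021) is simply appended as rows whose response is NA (illustrative values matching the table):

```r
# Observed years plus the prediction horizon coded as NA
observed <- data.frame(year   = rep(c(2019, 2020), each = 3),
                       week   = rep(1:3, times = 2),
                       deaths = c(0, 1, 4, 0, 1, 4))
horizon  <- data.frame(year = 2021, week = 1:3, deaths = NA)
rbind(observed, horizon)
```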
To carry out predictions using INLA, one approach is exactly the previous one: once the data are arranged as in the previous table, it suffices to fit a usual model in INLA, since it will internally compute the posterior predictive distribution.
library(kableExtra)
library(tidyverse)
library(INLA)
library(yardstick)
library(gt)
library(innovar)
library(spdep)
library(reshape2)
db <- readRDS("db_excess_proc.rds")
db.lima <- db %>%
filter(prov=="LIMA")
db.lima %>%
ggplot()+
geom_line(aes(x=date,y=n),color="#011f4b",lwd=1.5)+
geom_point(aes(x=date,y=n),color="black",size=2.5)+
ylab("Number of deaths")+
xlab("")+
ggtitle("Evolution of the number of deaths")+
theme_bw()+
theme(plot.title = element_text(hjust = 0.5,size = 35),
strip.background = element_rect(fill="#011f4b"),
strip.text = element_text(color = "white",face = "bold",size = 5))

To fit models we use the inla() function, which has several parameters; some of the most important are:
data : An object, typically of class data.frame, containing the data used to fit the model.
formula : An object of class formula that specifies the equation we intend to fit, for example inla(target ~ 1 + x1). In the formula we can specify linear effects (by entering the variable name) or non-linear effects (using f()).
verbose : A boolean variable indicating whether to show the convergence process in the console.
lima.m.m0 <- inla(n ~ 1 + annus,
verbose = F,
data = db.lima
)

The parameters detailed above are the essential ones for fitting a model with INLA. However, some extra parameters to consider are the following:
family : An object of class character. This parameter is crucial, as it determines the distribution of the target variable; by default family = "gaussian".
control.compute : An object of class list that allows specifying the calculation of information criteria such as dic, waic, and cpo.

lima.m.m1 <- inla(n ~ 1 + annus,
verbose = F,
data = db.lima,
control.compute = list(dic=TRUE, waic=TRUE,cpo=TRUE),
control.predictor = list(link = 1),
)

In forecast analysis, data in time-series format are typically used. It is commonly assumed that the time series can be decomposed as follows:
\[\begin{align*} y_{t} = S_{t}+T_{t}+e_{t} \end{align*}\]
where:
\(S_{t}\) : Seasonality
\(T_{t}\) : Trend
\(e_{t}\) : white noise
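As an illustration of this decomposition, a simulated weekly series with known seasonality, trend, and noise can be decomposed with base R's decompose(); this is only a sketch on artificial data, not the workshop model:

```r
# Simulate three years of weekly data with known components
set.seed(1)
t <- 1:156
season <- 5 * sin(2 * pi * t / 52)   # S_t: yearly seasonality
trend  <- 0.1 * t                    # T_t: linear trend
noise  <- rnorm(length(t))           # e_t: white noise
y <- ts(season + trend + noise, frequency = 52)
components <- decompose(y)           # estimates S_t, T_t and the remainder
plot(components)
```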
The components detailed above can be modeled in the context of INLA using a priori distributions for each component such as the following:
f(variable, model = "ar1")
f(variable, model = "rw1")
f(variable, model = "rw2")

Given the behavior exhibited by the series that we intend to model, a suitable prior structure can be chosen for each component.
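The full catalogue of latent (prior) models available in INLA, beyond the three shown above, can be listed directly from the package:

```r
library(INLA)
# List all latent models that can be used inside f()
inla.list.models("latent")
```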
Taking these brief suggestions into account, we proceed to fit models. First, we set as missing the period we wish to predict, in this case 2019.
db <- db %>%
mutate(n=ifelse(annus == 2019 & prov=="LIMA" ,NA,n))
db.n <- db.lima %>%
mutate(n=ifelse(annus == 2019,NA,n))

Subsequently, we again use the inla() function with all the complementary parameters detailed above. An important point to consider, however, is that the process we model is a counting process; therefore we will assume a negative binomial distribution, family = "nbinomial". It is worth mentioning that the complete list of possible distributions for the response variable can be seen using inla.list.models("likelihood").
# Assuming an a priori Distribution for the years: "ar1"
lima.m.m1 <- inla(n ~ 1 + temperature + f(annus,model = "ar1"),
control.compute = list(dic=TRUE, waic=TRUE,cpo=TRUE),
control.predictor = list(link = 1),
verbose = F,
family = "nbinomial",
data = db.n
)
# Assuming an a priori Distribution for the weeks: "rw1"
lima.m.m2 <- inla(n ~ 1 + temperature+ f(week,model = "rw1"),
control.compute = list(dic=TRUE, waic=TRUE,cpo=TRUE),
control.predictor = list( link = 1),
verbose = F,
family = "nbinomial",
data = db.n
)
# both priors at same time
lima.m.m3 <- inla(n ~ 1 + temperature + f(annus,model = "ar1") + f(week,model = "rw1"),
control.compute = list(dic=TRUE, waic=TRUE,cpo=TRUE),
control.predictor = list( link = 1),
verbose = F,
family = "nbinomial",
data = db.n
)

Once the models are fitted, we can examine the behavior of the variables considered. To obtain a summary of the linear effects we can use InlaObject$summary.fixed, while for the non-linear effects we use InlaObject$summary.random. Finally, we use kbl() together with kable_styling(), both from the kableExtra package, to style the result.
lima.m.m1$summary.fixed %>% kbl() %>%
kable_styling()

| | mean | sd | 0.025quant | 0.5quant | 0.975quant | mode | kld |
|---|---|---|---|---|---|---|---|
| (Intercept) | 25.1163025 | 1.9149318 | 21.3445050 | 25.1177697 | 28.875289 | 25.1206226 | 7e-07 |
| temperature | -0.6475524 | 0.0656101 | -0.7764705 | -0.6476078 | -0.518465 | -0.6477043 | 7e-07 |
lima.m.m2$summary.fixed %>% kbl() %>%
kable_styling()

| | mean | sd | 0.025quant | 0.5quant | 0.975quant | mode | kld |
|---|---|---|---|---|---|---|---|
| (Intercept) | 19.5100231 | 4.7444553 | 9.2622663 | 19.8927111 | 27.7492250 | 20.8160992 | 2.92e-05 |
| temperature | -0.4553711 | 0.1625714 | -0.7382051 | -0.4685067 | -0.1038017 | -0.5001262 | 2.92e-05 |
lima.m.m3$summary.fixed %>% kbl() %>%
kable_styling()

| | mean | sd | 0.025quant | 0.5quant | 0.975quant | mode | kld |
|---|---|---|---|---|---|---|---|
| (Intercept) | 5.2479172 | 5.0241032 | -4.7853686 | 5.3147192 | 14.9081302 | 5.4553337 | 3e-07 |
| temperature | 0.0332091 | 0.1721467 | -0.2972065 | 0.0309164 | 0.3767154 | 0.0261007 | 3e-07 |
#one non-linear effect
lima.m.m1$summary.random %>% as.data.frame() %>% slice_tail(n = 5) %>% kbl() %>%
kable_styling()

| annus.ID | annus.mean | annus.sd | annus.0.025quant | annus.0.5quant | annus.0.975quant | annus.mode | annus.kld |
|---|---|---|---|---|---|---|---|
| 2017 | -0.0576854 | 0.0328859 | -0.1332144 | -0.0565519 | 0.0116763 | -0.0549531 | 0e+00 |
| 2018 | 0.0576854 | 0.0328859 | -0.0117264 | 0.0565649 | 0.1332459 | 0.0549531 | 0e+00 |
| 2019 | -0.0256657 | 0.0541348 | -0.1219166 | -0.0341827 | 0.1014338 | -0.0490460 | 1e-07 |
lima.m.m2$summary.random %>% as.data.frame() %>% slice_tail(n = 5) %>% kbl() %>%
kable_styling()

| week.ID | week.mean | week.sd | week.0.025quant | week.0.5quant | week.0.975quant | week.mode | week.kld |
|---|---|---|---|---|---|---|---|
| 48 | 0.0314752 | 0.0267087 | -0.0207982 | 0.0310096 | 0.0853506 | 0.0294569 | 7.7e-06 |
| 49 | 0.0317820 | 0.0272356 | -0.0213636 | 0.0312798 | 0.0868094 | 0.0296900 | 6.5e-06 |
| 50 | 0.0310421 | 0.0284228 | -0.0248779 | 0.0306129 | 0.0882638 | 0.0291008 | 5.1e-06 |
| 51 | 0.0290670 | 0.0304697 | -0.0321229 | 0.0288937 | 0.0897392 | 0.0277042 | 3.9e-06 |
| 52 | 0.0286295 | 0.0334698 | -0.0392944 | 0.0285249 | 0.0952601 | 0.0272482 | 2.8e-06 |
# two or more non-linear effects
lima.m.m3$summary.random$annus %>% as.data.frame() %>% slice_tail(n = 5) %>% kbl() %>%
kable_styling()

| ID | mean | sd | 0.025quant | 0.5quant | 0.975quant | mode | kld |
|---|---|---|---|---|---|---|---|
| 2017 | -0.0720041 | 0.0386745 | -0.1608820 | -0.0713316 | 0.0133809 | -0.0703285 | 0e+00 |
| 2018 | 0.0720042 | 0.0386745 | -0.0134202 | 0.0713291 | 0.1609140 | 0.0703285 | 0e+00 |
| 2019 | -0.0331967 | 0.0658967 | -0.1463806 | -0.0455443 | 0.1248037 | -0.0643716 | 3e-07 |
lima.m.m3$summary.random$week %>% as.data.frame() %>% slice_tail(n = 5) %>% kbl() %>%
kable_styling()

| ID | mean | sd | 0.025quant | 0.5quant | 0.975quant | mode | kld |
|---|---|---|---|---|---|---|---|
| 48 | 0.0311491 | 0.0296311 | -0.0274780 | 0.0312121 | 0.0893815 | 0.0313638 | 9e-07 |
| 49 | 0.0225951 | 0.0297055 | -0.0365835 | 0.0228147 | 0.0805265 | 0.0232713 | 9e-07 |
| 50 | 0.0149695 | 0.0307703 | -0.0464340 | 0.0152485 | 0.0748035 | 0.0158254 | 9e-07 |
| 51 | 0.0023910 | 0.0336193 | -0.0649645 | 0.0028200 | 0.0673782 | 0.0037159 | 9e-07 |
| 52 | -0.0002577 | 0.0385124 | -0.0774016 | 0.0002249 | 0.0742427 | 0.0012494 | 9e-07 |
In addition, once the models are fitted, we can collect relevant information about their fit. The object generated by the inla() function is a list containing all the results corresponding to the options (parameters) used when running inla(). In this case we use InlaObject$summary.fitted.values to collect information about the fitted values.
fit.m.m1 <- lima.m.m1$summary.fitted.values$mean
fit.m.m2 <- lima.m.m2$summary.fitted.values$mean
fit.m.m3 <- lima.m.m3$summary.fitted.values$mean
n.m <- db.lima$n

Likewise, we assign all the fitted values of the models to a list object. We then use melt to create a long-format data structure suitable for plotting, and we collect the credibility intervals of the fitted values using lima.m.m1$summary.fitted.values$`0.025quant` and lima.m.m1$summary.fitted.values$`0.975quant`.
datos <- list("AR(1)"=fit.m.m1,"RW(1)"=fit.m.m2,"AR(1)RW(1)"=fit.m.m3) %>%
as.data.frame() %>%
melt( variable.name = "modelo",
value.name = "fit") %>%
group_by(modelo) %>%
mutate(actual = n.m,
date = db.n$date) %>%
ungroup() %>%
mutate(min_inter = c(lima.m.m1$summary.fitted.values$`0.025quant`,
lima.m.m2$summary.fitted.values$`0.025quant`,
lima.m.m3$summary.fitted.values$`0.025quant`),
max_inter = c(lima.m.m1$summary.fitted.values$`0.975quant`,
lima.m.m2$summary.fitted.values$`0.975quant`,
lima.m.m3$summary.fitted.values$`0.975quant`)
)

Subsequently, we plot the data as well as the fit of the proposed models with their respective credibility intervals, using ggplot() from the ggplot2 package.
datos %>%
ggplot() +
geom_line(aes(x=date,y=actual),lwd=1.5) +
geom_line(aes(x=date,y=fit),lwd=1.5,color="#011f4b",linetype = "dashed") +
geom_ribbon(aes(x=date,y=fit,ymin=min_inter, ymax=max_inter),
alpha=0.2, #transparency
fill="#011f4b",
color="#011f4b",lwd=1.5) +
facet_wrap(vars(modelo)) +
theme_bw()+
xlab(" ") +
ylab("Number of cases")+
geom_vline(xintercept = as.Date("2019-01-07"), linetype="dotted",
color = "black", size=3)+
theme(plot.title = element_text(hjust = 0.5,size = 20),
strip.background = element_rect(fill="#011f4b"),
strip.text = element_text(color = "white",face = "bold",size = 35))

To compute metrics for evaluating the proposed models we use the yardstick package. In particular, we use the metric_set function to select the metrics we want to compute (mae, mape, mpe, rmse, msd). We group the data by model to perform the calculation at that level, filter by the prediction year 2019, and finally use the perform.metrics function, specifying the truth and estimate parameters that contain the true and fitted values, respectively.
#Metrics to use
perform.metrics <- metric_set(mae,mape,mpe,rmse,msd)
# Calculation of metrics in the forecast year
tbl.yrd.m <- datos %>%
filter(date>=as.Date("2019-01-07")) %>%
group_by(modelo) %>%
perform.metrics(truth = actual,
estimate = fit)

Finally, the resulting object has a long-format structure, so to report these results in a table we first convert the data.frame to wide format with the pivot_wider function, which takes the new column names (names_from) from the levels of the .metric variable and fills them with the values (values_from) from .estimate. Finally, we use gt() from the gt package to produce a custom table.
# Results table
tbl.yrd.m %>%
pivot_wider(id_cols = modelo,
names_from = .metric,
values_from = .estimate) %>%
gt() %>%
tab_header(
title = md("out-of-sample accuracy metrics")#,
#subtitle ="out-of-sample accuracy metrics"
) %>%
data_color(
columns = vars(mae,mape,mpe,rmse,msd),
colors = scales::col_numeric(
palette = c(
"#011f4b","white"),
domain = NULL)
) %>% tab_footnote(
footnote = "mae = mean absolute error",
locations = cells_column_labels(columns = mae)
) %>% tab_footnote(
footnote = "mape = Mean absolute percent error",
locations = cells_column_labels(columns = mape))%>%
tab_footnote(
footnote = "mpe = Mean percentage error",
locations = cells_column_labels(columns = mpe))%>%
tab_footnote(
footnote = "rmse = Root mean square error",
locations = cells_column_labels(columns = rmse))%>%
tab_footnote(
footnote = "msd = Mean signed deviation",
locations = cells_column_labels(columns = msd))

| out-of-sample accuracy metrics | | | | | |
|---|---|---|---|---|---|
| modelo | mae1 | mape2 | mpe3 | rmse4 | msd5 |
| AR.1. | 89.32137 | 15.12392 | 14.90633 | 105.10674 | 88.29872 |
| RW.1. | 74.96485 | 12.76259 | 12.61102 | 91.60271 | 74.23950 |
| AR.1.RW.1. | 89.23152 | 15.29011 | 15.29011 | 101.74050 | 89.23152 |
Footnotes: 1 mae = mean absolute error; 2 mape = mean absolute percent error; 3 mpe = mean percentage error; 4 rmse = root mean square error; 5 msd = mean signed deviation.
Among the temporal structures, the best model in terms of out-of-sample prediction (out-of-sample forecasting) was the one that assumes a random walk of order 1 (rw1) for that structure.
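Since control.compute requested dic and waic when fitting, the in-sample information criteria of the three temporal models can also be compared directly from the fitted objects (a sketch; lower values indicate a better in-sample fit):

```r
# In-sample comparison of the three temporal models via DIC and WAIC
data.frame(model = c("AR(1)", "RW(1)", "AR(1)+RW(1)"),
           dic   = c(lima.m.m1$dic$dic,   lima.m.m2$dic$dic,   lima.m.m3$dic$dic),
           waic  = c(lima.m.m1$waic$waic, lima.m.m2$waic$waic, lima.m.m3$waic$waic))
```

Note that in-sample criteria and out-of-sample metrics need not agree on which model is best.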
The spatio-temporal approach complements the previous one, making the model more complex by taking the spatial dimension into account. For this we need to include spatial random effects to control for that dimension of the analysis. It is worth mentioning that areal spatial models in INLA necessarily require the creation of a graph, i.e. a spatial data structure that captures the arrangement of the areas as a matrix of spatial weights.
\[\begin{align*} y_{t} = S_{t}+T_{t}+\nu_{t}+u_{t}+e_{t} \end{align*}\]
where:
\(\nu_{t}\) : structured effects
\(u_{t}\) : unstructured effects
The two new components \(\nu_{t}, u_{t}\) are intended to account for the spatial correlation present in the data:
The unstructured effects correspond to random effects that attempt to control for unobserved characteristics of each area under study. To model them we can use f(area, model = "iid").
Structured effects are random effects that explicitly take the spatial structure into account. They can be modeled in INLA in various ways:

f(area, model = "besag")
f(area, model = "besagproper")

Given the above, we collect the geographic information of Peru, expressed as a shapefile at the provincial level. Besides loading the shapefile, we group the data at the provincial level and use the summarise function to dissolve the polygons of the shapefile to the desired level.
data("Peru")
peru.prov<-Peru %>%
group_by(reg,prov) %>%
summarise()

Subsequently, we create a neighborhood structure from the shapefile and, in turn, from that neighborhood structure, the spatial weights matrix.
# Creating the neighborhood structure
peru.adj <- poly2nb(peru.prov)
# Spatial weights
W.peru <- nb2mat(peru.adj, style = "W")

Finally, we remove non-existent provinces and create an id for each province in order to identify each polygon. The latter is an extra requirement for fitting spatial models in INLA.
db.m.sp<- db %>%
group_by(prov) %>%
filter(!prov %in% c("EXTRANJERO","ARICA")) %>%
mutate(id.sp=cur_group_id())

We fit the model, following the steps indicated above.
peru.m.m5 <- inla(n ~ 1 + f(annus,model = "linear") +
f(week,model="rw1") +
f(id.sp, model = "besag",
graph=W.peru)+
temperature,
data = db.m.sp,
control.compute = list(dic = TRUE,
waic = TRUE,
cpo = TRUE),
control.predictor = list(link = 1)
)

Now we can see the behavior of our control variables.
peru.m.m5$summary.fixed %>% kbl() %>%
kable_styling()

| | mean | sd | 0.025quant | 0.5quant | 0.975quant | mode | kld |
|---|---|---|---|---|---|---|---|
| (Intercept) | -909.9044189 | 69.9206924 | -1047.1823065 | -909.9063798 | -772.7410811 | -909.9044189 | 0 |
| temperature | -4.4377112 | 0.4281277 | -5.2782704 | -4.4377233 | -3.5978537 | -4.4377112 | 0 |
| annus | 0.5188985 | 0.0356266 | 0.4489514 | 0.5188975 | 0.5887873 | 0.5188985 | 0 |
# two or more non-linear effects
peru.m.m5$summary.random$week %>% as.data.frame() %>% slice_tail(n = 5) %>% kbl() %>%
kable_styling()

| ID | mean | sd | 0.025quant | 0.5quant | 0.975quant | mode | kld |
|---|---|---|---|---|---|---|---|
| 48 | 0.2129252 | 0.1170359 | -0.0168558 | 0.2129219 | 0.4425144 | 0.2129252 | 0 |
| 49 | 0.2450635 | 0.1181471 | 0.0131008 | 0.2450602 | 0.4768326 | 0.2450635 | 0 |
| 50 | 0.3003677 | 0.1208096 | 0.0631777 | 0.3003643 | 0.5373597 | 0.3003677 | 0 |
| 51 | 0.3250225 | 0.1281385 | 0.0734434 | 0.3250189 | 0.5763917 | 0.3250225 | 0 |
| 52 | 0.3849953 | 0.1478452 | 0.0947254 | 0.3849912 | 0.6750230 | 0.3849953 | 0 |
peru.m.m5$summary.random$id.sp %>% as.data.frame() %>% slice_tail(n = 5) %>% kbl() %>%
kable_styling()

| ID | mean | sd | 0.025quant | 0.5quant | 0.975quant | mode | kld |
|---|---|---|---|---|---|---|---|
| 192 | -12.193491 | 0.6163909 | -13.403674 | -12.193509 | -10.984318 | -12.193491 | 0 |
| 193 | -11.430281 | 0.5141140 | -12.439660 | -11.430295 | -10.421744 | -11.430281 | 0 |
| 194 | -7.413695 | 0.4995163 | -8.394414 | -7.413709 | -6.433794 | -7.413695 | 0 |
| 195 | -8.405336 | 0.4818823 | -9.351434 | -8.405350 | -7.460028 | -8.405336 | 0 |
| 196 | -2.432754 | 0.6351244 | -3.679717 | -2.432772 | -1.186831 | -2.432754 | 0 |
Again, we collect the information from the object resulting from inla().
db.m.sp.fit <- db.m.sp %>%
ungroup() %>%
mutate(fit =peru.m.m5$summary.fitted.values$mean,
min_inter=peru.m.m5$summary.fitted.values$`0.025quant`,
max_inter=peru.m.m5$summary.fitted.values$`0.975quant`)
db.lima.m.fit <- db.m.sp.fit %>%
filter(prov=="LIMA")
fit.m.m1 <- lima.m.m1$summary.fitted.values$mean
fit.m.m2 <- lima.m.m2$summary.fitted.values$mean
fit.m.m3 <- lima.m.m3$summary.fitted.values$mean
fit.m.m4 <- db.lima.m.fit$fit
min.m.m4 <- db.lima.m.fit$min_inter
max.m.m4 <- db.lima.m.fit$max_inter
n.m <- db.lima$n
datos.m <- list("AR(1)"=fit.m.m1,"RW(1)"=fit.m.m2,
"AR(1)RW(1)"=fit.m.m3,"AR(1)RW(1) with Spatial effects"=fit.m.m4) %>%
as.data.frame() %>%
melt( variable.name = "modelo",
value.name = "fit") %>%
group_by(modelo) %>%
mutate(actual = n.m,
date = db.n$date) %>%
ungroup() %>%
mutate(min_inter = c(lima.m.m1$summary.fitted.values$`0.025quant`,
lima.m.m2$summary.fitted.values$`0.025quant`,
lima.m.m3$summary.fitted.values$`0.025quant`,
min.m.m4),
max_inter = c(lima.m.m1$summary.fitted.values$`0.975quant`,
lima.m.m2$summary.fitted.values$`0.975quant`,
lima.m.m3$summary.fitted.values$`0.975quant`,
max.m.m4)
)

And again we plot the results of the models to compare their fit with the true data.
Finally, we follow the same steps as in the temporal focus section.
#Metrics to use
perform.metrics <- metric_set(mae,mape,mpe,rmse,msd)
# Calculation of metrics in the forecast year
tbl.yrd.m <- datos.m %>%
filter(date>=as.Date("2019-01-07")) %>%
group_by(modelo) %>%
perform.metrics(truth = actual,
estimate = fit)
#results table
tbl.yrd.m %>%
pivot_wider(id_cols = modelo,
names_from = .metric,
values_from = .estimate) %>%
gt() %>%
tab_header(
title = md("out-of-sample accuracy metrics"),
#subtitle ="out-of-sample accuracy metrics"
) %>%
data_color(
columns = vars(mae,mape,mpe,rmse,msd),
colors = scales::col_numeric(
palette = c(
"#011f4b","white"),
domain = NULL)
)%>% tab_footnote(
footnote = "mae = mean absolute error",
locations = cells_column_labels(columns = mae)
) %>% tab_footnote(
footnote = "mape = Mean absolute percent error",
locations = cells_column_labels(columns = mape))%>%
tab_footnote(
footnote = "mpe = Mean percentage error",
locations = cells_column_labels(columns = mpe))%>%
tab_footnote(
footnote = "rmse = Root mean square error",
locations = cells_column_labels(columns = rmse))%>%
tab_footnote(
footnote = "msd = Mean signed deviation",
locations = cells_column_labels(columns = msd))

| out-of-sample accuracy metrics | | | | | |
|---|---|---|---|---|---|
| modelo | mae1 | mape2 | mpe3 | rmse4 | msd5 |
| AR.1. | 89.32137 | 15.12392 | 14.90633 | 105.10674 | 88.29872 |
| RW.1. | 74.96485 | 12.76259 | 12.61102 | 91.60271 | 74.23950 |
| AR.1.RW.1. | 89.23152 | 15.29011 | 15.29011 | 101.74050 | 89.23152 |
| AR.1.RW.1..with.Spatial.effects | 76.11171 | 12.61409 | 11.93369 | 87.56624 | 72.87075 |
Footnotes: 1 mae = mean absolute error; 2 mape = mean absolute percent error; 3 mpe = mean percentage error; 4 rmse = root mean square error; 5 msd = mean signed deviation.
Although the spatial model does not capture the temporal patterns especially well, on average it is the best model in terms of precision: it not only yields predictions with narrower credibility intervals but also outperforms the other models in out-of-sample prediction. Finally, it is worth mentioning that these results could be improved considerably with an adequate specification of the linear predictor, using prior structures more appropriate to the data (or perhaps more complex ones), as well as by considering other control variables.